A large number of complex systems find a natural abstraction in 
the form of weighted networks whose nodes represent the elements 
of the system and the weighted edges identify the presence of an 
interaction and its relative strength. In recent years, the study of 
an increasing number of large scale networks has highlighted the 
statistical heterogeneity of their interaction pattern, with degree 
and weight distributions which vary over many orders of magnitude. 
These features, along with the large number of elements and links, 
make the extraction of the truly relevant connections forming the 
network's backbone a very challenging problem. More specifically, 
coarse-graining approaches and filtering techniques struggle 
with the multiscale nature of large scale systems. Here we define 
a filtering method that offers a practical procedure to extract the 
relevant connection backbone in complex multiscale networks, 
preserving the edges that represent statistically significant 
deviations with respect to a null model for the local assignment of 
weights to edges. 
An important aspect of the method is that it does not belittle small- 
scale interactions and operates at all scales defined by the weight 
distribution. We apply our method to real world network instances 
and compare the obtained results with alternative backbone extrac- 
tion techniques. 
In recent years, a huge amount of data on large scale social, 
biological, and communication networks, meticulously collected 
and catalogued, has become available for scientific analysis and 
study. Examples can be found in all domains, from technological 
and social systems to transportation networks on a local and 
global scale, and down to the microscopic scale of biochemical 
networks [1][2][3]. A common trait of these networks is large 
scale heterogeneity, with statistical observables such as nodes' 
degree and traffic varying over a wide range of scales [4]. The 
sheer size and multiscale nature of these networks make it very 
difficult to extract the relevant information that would allow a 
reduced representation while preserving the key features we want 
to highlight. A typical example arises in the visualization of 
networks. While it is generally possible to create wonderful 
images of large scale heterogeneous networks, the amount of 
valuable information gathered is in most cases very small because 
of the redundant intricacy generated by the overwhelming number 
of connections. Problems such as the extraction of the relevant 
backbone or the isolation of the statistically relevant structures 
(the signal) that would allow reduced but meaningful 
representations of the system are indeed major challenges in the 
analysis of large-scale networks. 

In complex weighted networks, finding the right trade-off between 
the level of network reduction and the amount of relevant 
information preserved in the new representation poses additional 
problems. In many cases, the probability distribution P(ω) that 
any given link carries a weight ω is broad, spanning several 
orders of magnitude. This feature implies the lack of a 
characteristic scale, and any method based on thresholding would 
simply overlook the information present above or below the 
arbitrary cut-off scale. While this issue would not be a major 
drawback in networks where the intensities of all the edges are 
independently and identically distributed, cutting off the P(ω) 
tail would destroy the multiscale nature of more realistic 
networks, where weights are locally correlated on edges incident 
to the same node and non-trivially coupled to topology [5]. Thus, 
the presence of multiscale fluctuations calls for reduction 
techniques that consistently highlight the relevant structures and 
hierarchies without favoring any particular resolution scale. 
Furthermore, it demands a shift of focus from a global 
perspective to a local one, where the relevance of the connections 
is decided at the level of nodes in relative terms. 

In this work, we concentrate on a particular technique 
that operates at all the scales defined by the weighted net- 
work structure. This method, based on the local identification 
of the statistically relevant weight heterogeneities, is able to 
filter out the backbone of dominant connections in weighted 
networks with strong disorder, preserving structural proper- 
ties and hierarchies at all scales. We discuss our multiscale 
filter in relation to the appropriate null model that provides 
the basis for the statistical significance of the heterogeneity 
measurements. We apply the technique to two real world net- 
works, the U.S. airport network and the Florida Bay food web, 
and compare the results to those obtained by the application 
of thresholding methods. 



Results and Discussion 

In Statistical Mathematics, as in other areas, filtering tech- 
niques aimed at uncovering the relevant information in data 
sets are popular and successful. One could cite, for in- 
stance, the Principal Components Analysis to identify hidden 
patterns by reducing the effective dimension of multivariate 
data [6]. In the following, we will refer to the network reduc- 
tion as the construction of a network that contains far less 
data (in our case links) and allows the discrimination and 
computational tractability of the relevant features of the orig- 
inal networks; for instance, the traffic backbone of a large 
scale transportation infrastructure. Reduction schemes can be 
divided into two main categories: coarse-graining and 
filtering/pruning. In the first case, nodes sharing a common 
attribute are gathered together in the same class (group, 
community, etc.) and then replaced by a single new unit which 
represents the whole class in a new network representation of the 
system [7][8][9][10]. Coarse-graining thus zooms out of the 
system so that it can be observed at different scales. Something 
completely different is done when a filter is applied. In this 
case, the observation scale is fixed and the representation that 
the network symbolizes is not changed. Instead, those elements 
(nodes and edges) that carry relevant information about the 
network structure are kept while the rest are discarded. A 
well-known example of a hierarchical topological filter, although 
usually not referred to as such, is the k-core decomposition of a 
network [11], with a filtering rule that acts on the connectivity 
of the nodes. 
In the case of weighted networks [5], two basic reduction 
techniques are the extraction of the minimum spanning tree and 
the application of a global threshold on the weights of the links, 
so that only those that exceed the threshold are preserved. The 
minimum spanning tree of a graph G, a classical concept of graph 
theory [12], is the shortest-length tree subgraph that contains 
all the nodes of G. This definition can be generalized to 
weighted graphs [13]: a minimum spanning tree of a weighted graph 
G is the spanning tree of G whose edges sum to minimum weight. 
This idea has been exploited along with percolation criticality 
to define superhighways in weighted networks [14]. By using 
suitable transformation rules for the weights, it is also possible 
to define maximum weighted spanning trees and other analogous 
constructions. One of the big limitations of this method is that 
spanning trees are by construction acyclic. This means that the 
reduced networks obtained by this algorithm are oversimplified 
structures that destroy local cycles, the clustering coefficient, 
and the clustering hierarchies often present in real world 
networks. 

These drawbacks are not present when a threshold is applied to 
the global weight distribution, removing all connections with a 
weight below a given value ω_c. This filter has been used, for 
instance, in the study of functional networks connecting 
correlated human brain sites [15] and of food web resistance as a 
function of link magnitude [16]. This approach, however, 
belittles nodes with a small strength s (defined as the sum of the 
weights incident to the node, s_i = Σ_j ω_ij), since the 
introduction of ω_c induces a characteristic scale from the 
outset. As a consequence, this simple thresholding algorithm 
performs very poorly on strongly disordered networks with 
heavy-tailed distributions P(s) and P(ω), since nodes with small 
s are systematically overlooked. The drawback is even more 
serious when weights are correlated at the local level. In this 
type of network, interesting features and structures are present 
at all scales, and the introduction of such an artificial cut-off 
removes all information below the cut-off scale. 
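
The effect described above can be illustrated with a minimal sketch. The toy edge list and the helper names `global_threshold` and `surviving_nodes` are assumptions introduced here for illustration only; they are not taken from the data sets analyzed in this work.

```python
def global_threshold(edges, w_c):
    """Keep only the edges whose weight is at least w_c (hypothetical helper)."""
    return [(i, j, w) for i, j, w in edges if w >= w_c]


def surviving_nodes(edges):
    """Set of nodes that still have at least one edge."""
    return {n for i, j, _ in edges for n in (i, j)}


# Toy network: a heavy core (A, B, C) and a light periphery around D.
edges = [
    ("A", "B", 1000.0), ("A", "C", 800.0), ("B", "C", 500.0),
    ("C", "D", 1.0), ("D", "E", 5.0), ("D", "F", 0.1), ("D", "G", 0.1),
]

kept = global_threshold(edges, w_c=100.0)
lost = surviving_nodes(edges) - surviving_nodes(kept)
print(sorted(lost))  # nodes that vanish together with their light links
```

In this toy case the whole low-strength region around D disappears, even though the link D-E clearly dominates the local neighborhood of D.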



Local fluctuations. In order to develop a multiscale reduction 
algorithm, we take advantage of the local fluctuations of weights 
on the links emanating from single nodes. In heterogeneous 
weighted networks with strong disorder, i.e. heavy-tailed P(ω) 
and P(s) distributions, a few links carry the largest proportion 
of the node's total strength. Furthermore, most real networks 
have nodes whose incident edges carry weights that are 
heterogeneously distributed and mutually correlated. The 
fingerprint of these correlations is observed in the non-trivial 
dependence between weights and topology [5]. The better a node 
is connected to the rest of the network, the higher the weight of 
its edges, so that the strength tends to grow superlinearly with 
the degree. However, the strength alone is not enough to capture 
the weighted structure of nodes even at the local level. We need 
to introduce some measure of the fluctuations of the weights 
attached to a given node, and we want to do it at the local level 
in relative terms so that each node can independently assess the 
importance of its connections. To this end, we first normalize 
the weights of the edges linking node i with its neighbors as 
p_ij = ω_ij / s_i, where s_i is the strength of node i and ω_ij 
is the weight of its connection to its neighbor j. Then, by using 
the disparity function defined in the Materials and Methods 
section, it is possible to see that even at the local level 
defined by the edges adjacent to a single node, a few of those 
edges carry a disproportionate fraction p_ij of the node's 
strength, with the remaining edges carrying just a small fraction 
of the node's strength [17][5]. 
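
As a minimal sketch of the local normalization, the snippet below computes the strengths and the normalized weights for an undirected edge list. The function names and the toy weights are assumptions made for illustration.

```python
from collections import defaultdict


def strengths(edges):
    """Strength of each node: sum of the weights of its incident edges."""
    s = defaultdict(float)
    for i, j, w in edges:
        s[i] += w
        s[j] += w
    return dict(s)


def normalized_weights(edges):
    """Return p[(i, j)] = w_ij / s_i, seen from both endpoints of each edge."""
    s = strengths(edges)
    p = {}
    for i, j, w in edges:
        p[(i, j)] = w / s[i]
        p[(j, i)] = w / s[j]
    return p


edges = [("A", "B", 90.0), ("A", "C", 5.0), ("A", "D", 5.0)]
p = normalized_weights(edges)
print(p[("A", "B")], p[("B", "A")])
```

Note that the same edge generally has two different normalized weights, one for each endpoint, which is why the relevance of a link is assessed separately from each of its two nodes.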

Being more specific, we are interested in all edges with 
weights representing a significant fraction of the local strength 
and weight magnitude of each given node. However, local 
heterogeneities could simply be produced by random fluctua- 
tions. It is then fundamental to introduce a null model that 
informs us about the random expectation for the distribution 
of weights associated to the connections of a particular node. 
Empirical values not statistically compatible with the null 
model define, on a node by node basis, whether the observed 
weight heterogeneity and intensity are statistically significant 
and define the relevant part of the signal due to specific and 
relevant organizing principles of the network structure. This 
procedure would determine without arbitrariness how many 
connections for every node belong to the backbone of con- 
nections that carry a statistically disproportionate weight -be 
them one, zero or many-, providing sparse subnetworks of con- 
nected links selected according to the total amount of weight 
we intend to characterize. This reduction scheme necessarily 
encodes a wealth of information as the reduced network does 
not contain only the links carrying the largest weight in the 
network but also all links which can be considered, according 
to a pre-defined statistical significance level, defining the rel- 
evant structure (signal) generated by the weight and strength 
assignment with respect to the simple randomness of the null 
hypothesis. An important aspect of this construction is that 
the ensuing reduction algorithm does not belittle small nodes 
in terms of strength and then offers a practical procedure to 
reduce the number of connections taking into account all the 
scales present in the system. 

The disparity filter. In the following, we discuss the disparity 
filter for undirected weighted networks, although it is also 
applicable to directed ones, as reported in the Supporting 
Information. The null model that we use to define anomalous 
fluctuations provides the expectation for the disparity measure 
of a given node in a purely random case. It is based on the 
following null hypothesis: the normalized weights that correspond 
to the connections of a certain node of degree k are produced by 
a random assignment from a uniform distribution. To visualize 
this process, k − 1 points are distributed with uniform 
probability in the interval [0, 1], so that it ends up divided 
into k subintervals. Their lengths represent the expected values 
for the k normalized weights p_ij according to the null 
hypothesis. The probability density function for one of these 
variables taking a particular value x is 

    p(x)\,dx = (k-1)(1-x)^{k-2}\,dx,    [1]

which depends on the degree k of the node under consideration. 
In the Materials and Methods section we provide a detailed 
analysis of the null model with respect to the actual weight 
distribution in two real world networks. 
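
The interval-splitting picture of the null model can be checked numerically. The sketch below is an illustration under stated assumptions: the function names, the seed, and the number of trials are arbitrary choices, and the comparison uses the tail probability (1 − x)^(k−1) obtained by integrating the density of Eq. [1] from x to 1.

```python
import random


def null_normalized_weights(k, rng):
    """Split [0, 1] with k - 1 uniform points; return the k subinterval lengths."""
    cuts = sorted(rng.random() for _ in range(k - 1))
    bounds = [0.0] + cuts + [1.0]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def empirical_tail(k, x, trials=20000, seed=0):
    """Monte Carlo estimate of the probability that one length is >= x.

    The first subinterval is used; all k lengths share the same distribution.
    """
    rng = random.Random(seed)
    hits = sum(null_normalized_weights(k, rng)[0] >= x for _ in range(trials))
    return hits / trials


k, x = 5, 0.3
estimate = empirical_tail(k, x)
exact = (1 - x) ** (k - 1)  # integral of (k-1)(1-t)^(k-2) from x to 1
print(estimate, exact)
```

The two numbers should agree up to sampling noise, which decreases as the number of trials grows.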

The disparity filter proceeds by identifying which links for each 
node should be preserved in the network. The null model allows 
this discrimination through the calculation, for each edge of a 
given node, of the probability α_ij that its normalized weight 
p_ij is compatible with the null hypothesis. In statistical 
inference, this concept is known as the p-value: the probability 
that, if the null hypothesis is true, one obtains a value for the 
variable under consideration larger than or equal to the observed 
one. By imposing a significance level α, the links that carry 
weights that are not compatible with a random distribution can be 
singled out with a given statistical significance. All the links 
with α_ij < α reject the null hypothesis and can be considered 
significant heterogeneities due to the network organizing 
principles. By changing the significance level we can filter the 
links progressively, focusing on more relevant edges. The 
statistically relevant edges are those whose weights satisfy the 
relation 

    \alpha_{ij} = 1 - (k-1) \int_0^{p_{ij}} (1-x)^{k-2}\,dx < \alpha.    [2]

Note that this expression depends on the number of connections k 
of the node to which the link under consideration is attached. 
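
The integral in Eq. [2] is elementary, so the criterion can also be written in closed form. The short derivation below uses only the density of Eq. [1] and may be convenient when implementing the filter:

```latex
\begin{align}
\alpha_{ij} &= 1 - (k-1)\int_0^{p_{ij}} (1-x)^{k-2}\,dx
             = 1 - \Bigl[\,1 - (1-p_{ij})^{k-1}\Bigr]
             = (1-p_{ij})^{k-1}, \\
\alpha_{ij} < \alpha
  \quad &\Longleftrightarrow \quad
  p_{ij} > 1 - \alpha^{1/(k-1)} \qquad (k \ge 2).
\end{align}
```

As a worked example, for a node of degree k = 10 and α = 0.05 an edge is retained from that node's side when it carries more than roughly 28% of the node's strength, since 1 − 0.05^(1/9) ≈ 0.28.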

The multiscale backbone is then obtained by preserving all the 
links that satisfy the above criterion for at least one of the 
two nodes at the ends of the link, while discounting the rest.¹ 
In this way, small nodes in terms of strength are not belittled, 
so that the system remains in the percolated phase. In other 
words, we single out the part of the network that carries the 
statistically relevant signal with respect to a null hypothesis 
of local uniform randomness. By choosing a constant significance 
level α we obtain a homogeneous criterion that allows us to 
compare inhomogeneities in nodes with different magnitudes of 
degree and strength. Decreasing α yields more restrictive 
subsets, giving rise to a potential hierarchy of backbones. This 
strategy is efficient whenever the level of heterogeneity is high 
and weights are locally correlated. Otherwise, the pruning could 
lose its hierarchical attribute, producing results analogous to 
those of the global threshold algorithm (see section "Networks 
with uncorrelated weights" in Supporting Information). 
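
The procedure for undirected networks can be sketched in a few lines. This is an illustrative sketch rather than a reference implementation: the function name `disparity_backbone`, the edge-list representation, and the toy data are assumptions, and the p-value is computed with the closed form (1 − p_ij)^(k−1) that follows from integrating Eq. [2]. Positive weights are assumed.

```python
from collections import defaultdict


def disparity_backbone(edges, alpha):
    """Keep undirected edges (i, j, w) that are significant for at least one endpoint.

    An endpoint of degree 1 never passes the test on its own (its p-value is
    set to 1), so such an edge survives only if it is significant for the
    other endpoint.
    """
    strength = defaultdict(float)
    degree = defaultdict(int)
    for i, j, w in edges:
        for node in (i, j):
            strength[node] += w
            degree[node] += 1

    def p_value(node, w):
        k = degree[node]
        if k < 2:
            return 1.0
        return (1.0 - w / strength[node]) ** (k - 1)

    return [(i, j, w) for i, j, w in edges
            if p_value(i, w) < alpha or p_value(j, w) < alpha]


# Toy network with two very different scales: a heavy hub A and a light hub X.
edges = (
    [("A", "B", 91.0)] + [("A", f"a{n}", 1.0) for n in range(9)]
    + [("X", "Y", 0.5)] + [("X", f"x{n}", 0.01) for n in range(5)]
)
backbone = disparity_backbone(edges, alpha=0.05)
print(backbone)
```

In this toy example the link X-Y is retained although its weight is far smaller than every link of A, because it is dominant relative to the strength of X. A global threshold high enough to prune the light links of A would remove it.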

The multiscale backbone of real networks. To test the performance 
of the disparity filter algorithm, we apply it to the extraction 
of the multiscale backbone of two real world networks. We also 
compare the results with the reduced networks obtained by 
applying a simple global threshold strategy that preserves 
connections above a given weight ω_c. As examples of strongly 
disordered networks, we consider the domestic non-stop segment of 
the U.S. airport transportation system for the year 2006 [19] and 
the Florida Bay ecosystem in the dry season [20]. The U.S. 
airport transportation system for the year 2006 gathers the data 
reported by air carriers about flights between 1078 U.S. airports 
connected by 11890 links. Weights are given by the number of 
passengers traveling the corresponding route in the year, 
symmetrized to produce an undirected representation. The 
resulting graph has a high density of connections, ⟨k⟩ = 22, 
making both its analysis and visualization difficult. The 
Florida Bay food web comes from the ATLSS Project by the 
University of Maryland [21]. Trophic interactions in food webs 
are symbolized by directed and weighted links representing carbon 
flows (mg C y^-1 m^-2) between species. The network consists of 
a total of 122 separate components joined by 1799 directed links. 

In Table 1 and Fig. 1, we show statistics for the relative sizes 
(in terms of fractions of total weight W_T, nodes N_T, and edges 
E_T) preserved in the backbones when the network is filtered by 
the disparity filter and by the application of a global 
threshold, respectively. The disparity filter reduces the number 
of edges significantly even when the significance level α is 
close to 1, while keeping almost all the weight and a high 
fraction of nodes. Smaller values of α reduce the number of edges 
even further but, interestingly, the total weight and number of 
nodes remain nearly constant. Only for very low values of α, 
when the filter becomes very restrictive, do the total weight and 
number of nodes start decreasing significantly. In the case of 
the airport network, values around α ≈ 0.05 extract backbones 
with more than 80% of the total weight, 66% of nodes, and only 
17% of edges. The global threshold filter, on the other hand, is 
not able to maintain the majority of the nodes in the backbone 
for similar values of retained weight or edges, as is clearly 
seen in the first and second columns of Fig. 1, respectively. 

Fig. 1. Fraction of nodes kept in the backbones as a function of 
the fraction of weight (left) and edges (right) retained by the 
filters. 

It is particularly interesting to analyze the behavior of the 
topological properties of the filtered network at increasing 
levels of reduction. Fig. 2 shows the evolution of the cumulative 
degree distribution, i.e. P_c(k) = Σ_{k' ≥ k} P(k'), for 
different values of α (top left plot) and ω_c (top right plot), 
respectively. The original airport network is heavy-tailed, 
although it cannot be fitted by a pure power law. Interestingly, 
the disparity filter reveals a clear power law behavior as α 
decreases, with an exponent γ ≈ 2.3. On the other hand, the 
global threshold filter produces subgraphs with a degree 
distribution similar to the original one but with a sharp cut-off 
that becomes smaller as the filter gets more restrictive. In 
turn, the weight distribution P(ω) for the disparity filter 
(middle left plot) shows that almost all scales are kept during 
the filtering process and only the region of very small weights 
is affected, in contrast to the global threshold filter which, by 
definition, cuts P(ω) off below ω_c (middle right plot). 
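
For reference, a cumulative degree distribution of the kind discussed above can be computed from a list of node degrees as in the following sketch. It assumes the convention P_c(k) = fraction of nodes with degree at least k; the function name and the sample degrees are hypothetical.

```python
from collections import Counter


def cumulative_degree_distribution(degrees):
    """Return {k: P_c(k)}, where P_c(k) is the fraction of nodes with degree >= k."""
    n = len(degrees)
    counts = Counter(degrees)
    pc, tail = {}, 0
    for k in sorted(counts, reverse=True):  # accumulate from the largest degree down
        tail += counts[k]
        pc[k] = tail / n
    return pc


pc = cumulative_degree_distribution([1, 1, 1, 2, 2, 5])
print(pc)
```

Plotting these values on doubly logarithmic axes is the usual way to inspect whether the tail is compatible with a power law.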

In the bottom plots of Fig. 2, we show the clustering coefficient 
C measured as the average over nodes of degree larger than 1. It 
remains nearly constant for both filters until they become too 
restrictive, in which case clustering goes to zero.² In the case 
of the disparity filter, clustering remains constant up to values 
of α ≈ 0.01. This is precisely the value below which both the 
number of nodes and the weight in the backbone start decreasing 
significantly. Therefore, we can conclude that values of α in 
the range [0.01, 0.5] are optimal, in the sense that backbones in 
this region have a large proportion of nodes and weight, the same 
clustering as the original network, and a stable stationary 
degree distribution, all with a very small number of connections 
compared to the original network. It is important to stress that 
the disparity filter also includes the connections with the 
largest weight present in the system. This is because the heavy 
tail of the P(ω) distribution is mainly determined by relevant 
large-scale weights. This is clearly illustrated in Fig. 3, 
where we show that for statistical significance levels up to 
α ≈ 10^[…], all the edges included in the 10-20% of the P(ω) 
tail are included in the extracted multiscale backbone. 

¹ In the case of a node i of degree k_i = 1 connected to a node j 
of degree k_j > 1, we keep the connection only if it beats the 
threshold for node j. 

² The sudden increase of clustering for a fraction of edges 
E_T = 0.2 is due to the reduction of the number of nodes in the 
network, which increases the chances of having a random 
contribution. 

Fig. 3. Topology of the filtered subgraphs for the U.S. airport 
network. Top: Cumulative degree distribution, P_c(k), for the 
disparity (left) and global threshold (right) backbones. The 
values of ω_c in the right plot are chosen to generate subgraphs 
with the same weight as the ones shown in the left plot. Middle: 
Distribution of link weights of the different subgraphs generated 
by the two filters. Symbols are the same as in the top plots. 
Bottom: Clustering coefficient averaged over nodes of degree 
larger than 1 for the two methods as a function of the fraction 
of edges in the backbones. Dashed lines show the fraction of 
nodes and weight for a given fraction of edges. 

As an illustration of the efficacy of the disparity filter, we 
visualize the obtained multiscale backbones in Fig. 4. In the 
case of the U.S. airport network we use the significance level 
α = 0.003 (see entry (b) in Table 1 and Fig. 3). Interestingly, 
the disparity filter offers a perspective of the network that 
reveals its geographic constraints (notice that each node is 
placed in the plane according to its actual coordinates on the 
earth). It is possible to identify local hubs with very well 
defined basins of attraction made of small airports connected to 
them [5], a star-like pattern that is particularly clear for 
Alaskan airports or Midwest cities. In addition, the hierarchy of 
the transportation system is fully highlighted, including not 
just the highest-flux connections but also small-weight edges 
that are statistically significant because they represent 
relevant signal at small scales. In this way, all important 
connections at the local and global levels are considered at 
once. This would not be possible with a global threshold 
algorithm, which would simply eliminate all connections below the 
scale introduced by the cut-off threshold. 

Fig. 4. Fraction of edges in different Global Threshold backbones 
(GTB) included in the Disparity backbone (DB) as a function of 
the significance level. As shown, points a and b in the U.S. 
airport network mark Disparity backbones including 100% of the 
40-W and 10-W Global Threshold backbones, respectively; points a 
and b in the Florida Bay food web mark Disparity backbones 
including 100% of the 40-W and 13-W Global Threshold backbones, 
respectively. See also Table 1. 

The Florida Bay food web is a directed network (see Supplementary 
Information for an explanation of the methodology in the case of 
weighted directed networks). We draw its multiscale backbone for 
α = 0.0008, which contains the top 40% of the heaviest links (see 
entry (a) in Table 1 and Fig. 3). Notice that, in this case, the 
concentration of weight in a few links is so strong that the 
represented disparity backbone contains approximately half of the 
total weight in the network. Again, star motifs are uncovered, 
formed mainly by incoming connections (as for the pelican) or 
mainly by outgoing ones (bivalves). More generally, specific 
subsystems dominated by significant fluxes can be easily 
identified, which might be evidence of a historical evolution of 
the network from smaller modular and disconnected structures to 
the complete ecosystem we observe today. Another interesting 
remark concerns the presence in the backbone of species with 
relatively few trophic links. Species with few connections are 
usually assumed to have a low impact on ecosystems. However, 
counterexamples can be found, and such species may act as the 
structural equivalent of keystone species, whereas species with 
many trophic linkages may be conceptually more similar to 
dominant species [25]. Due to its local approach, our filter 
mixes both types in the backbones, where big hubs (like the 
Predatory Shrimp, which in the complete network has an 
approximately average number of incoming connections and the 
maximum number of outgoing ones, 13 and 61 respectively) coexist 
with more modest species in terms of connections (like Benthic 
Flagellates, with in-degree 1 and out-degree 10, both below the 
average). 

Conclusions. The disparity filter exploits local heterogeneity 
and local correlations among weights to extract the network 
backbone by considering the relevant edges at all the scales 
present in the system. The methodology preserves an edge whenever 
its intensity is not statistically compatible with a null 
hypothesis of uniform randomness for at least one of the two 
nodes the edge is incident to, which ensures that small nodes in 
terms of strength are not neglected. As a result, the disparity 
filter reduces the number of edges in the original network 
significantly while keeping almost all the weight and a large 
fraction of nodes. In addition, this filter preserves the cut-off 
of the degree distribution, the form of the weight distribution, 
and the clustering coefficient. 

Fig. 5. Pajek representations [22] of disparity backbones. Top: 
The α = 0.003 multiscale backbone of the 2006 domestic segment of 
the U.S. airport transportation system. This disparity backbone 
entirely includes the top 10% of the heaviest edges. Bottom: The 
α = 0.0008 multiscale backbone of the Florida Bay ecosystem in 
the dry season. This disparity backbone entirely includes the top 
40% of the heaviest edges. These disparity backbones correspond 
to points (b) for the U.S. airport network and (a) for the 
Florida Bay food web in Table 1 and Fig. 3. The connection with 
maximum weight for the U.S. airport network is Atlanta-Orlando, 
with ω_max = 1,290,488 passengers/year, and for the Florida Bay 
food web it is Free Bacteria to Water Flagellates, with 
ω_max = 12.90 mg C y^-1 m^-2. 

As a criticism, one could say that the filter only works for 
systems with strong disorder, where the weights are 
heterogeneously distributed at both the global and local levels. 
Nevertheless, all filters have limitations, and one has to take 
them into account in relation to the problem under analysis. 
Which strategy is the most appropriate for a particular problem 
should be carefully judged, and we cannot exclude the possibility 
that a combination of different techniques turns out to be the 
most appropriate. Yet, the ubiquitous presence of fluctuations 
and disorder spanning many length scales in many real networks 
provides a wide range of potential applications for the present 
methodology in biology (metabolic networks, brain, periodically 
regulated genes), information technology (Internet, World Wide 
Web), economics (World Trade Web), and finance (stock markets). 

Materials and Methods 

Local heterogeneity of edges' weight. In order to assess the effect of 
inhomogeneities in the weights at the local level, for each node i with k 
neighbors one can calculate the function [17][5] 

    \Upsilon_i(k) = k\,Y_i(k) = k \sum_j p_{ij}^2.    [3]

The function Y_i(k) has been extensively used in several fields as a standard 
indicator of concentration for more than half a century: in Ecology [25], 
Economics [26], Physics [27], and recently in the complex networks literature, 
where it is known as the disparity measure [17]. In all cases, Y_i(k) 
characterizes the level of local heterogeneity. Under perfect homogeneity, when 
all the links share the same amount of the strength of the node, Υ_i(k) equals 1 
independently of k, while in the case of perfect heterogeneity, when just one of 
the links carries the whole strength of the node, this function is Υ_i(k) = k. 
An intermediate behavior is usually observed in real systems, with Υ_i(k) 
growing as a power of k with an exponent close to 1/2. In this case, the weights 
associated with a node are peaked on a small number of links, with the remaining 
connections carrying just a small fraction of the node's strength. This is the 
situation where our filter will be most useful, highlighting structures 
impossible to detect using the global threshold filter. In this way, the 
disparity function can be used as a preliminary indicator of the presence of 
local heterogeneities. 
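
The two limiting cases quoted above are easy to verify numerically. The sketch below evaluates k Σ_j p_ij² for the weights incident to a single node; the function name and the sample weights are assumptions, and the zero weights in the second example only stand for the idealized limit of perfect heterogeneity.

```python
def disparity(weights):
    """Disparity k * sum_j p_j**2 for the weights incident to one node."""
    k = len(weights)
    s = sum(weights)
    return k * sum((w / s) ** 2 for w in weights)


homogeneous = disparity([2.0, 2.0, 2.0, 2.0])   # all links share the strength equally
concentrated = disparity([1.0, 0.0, 0.0, 0.0])  # one link carries everything
print(homogeneous, concentrated)
```

For k = 4 the two cases give 1 and 4 respectively, matching the limits 1 and k stated in the text.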
Fig. 2. Top sketch: Sequential diagram illustrating the disparity filtering 
technique at the local level. We focus on the central node in orange and its 
first neighborhood: a) original network; b) edges of the central node with 
weights that represent a statistically significant heterogeneity; c) the same 
for the neighbors; d) intersection of the colored edges in b) and c) that are 
finally selected in the backbone. Middle graphs: Distribution of link weights, 
spanning six decades. Even though this distribution does not have a clear 
functional form, a direct power law fit of the form ω^-β yields an exponent 
β = 1.1, and hence a diverging first moment. Bottom graphs: Scatter plot of the 
disparity measure for individual airports of the U.S. airport network. The grey 
area corresponds to the average plus 2 standard deviations given by the null 
model. 



The null model. The probability density function of Eq. J^, along with the join 
probability distribution for two intervals given by 

p{x,y)dxdy = {k-l){k-2){l-x-y)^'^Q{l-x~y)dxdy, [4] 



where ©(■) is the Heaviside step function, can be used to calculate the statistics of 
Tnull{k) for the null model. The average /i(T„un(fc)) = kfl{Ynull{k)) 
and the variance (t'^ [T null {k)) = k^ (j'^ (Ynull{k)) are found to be: 
2k 

M(T"„„n(fc)) = ,^ , , [5] 



20 + 4fc 



it + 1 

a^(T„„„(fc)) = ^'((-fc + i)(fc + 2)(fc + 3) ik + 

Notice that the two moments depend on the degree k so that each node in the net- 
work with a certain degree k should be compared to the corresponding null model. 

The observed values $\Upsilon_{ob}(k)$ compatible with the null hypothesis could be defined as those in the region between $\mu(\Upsilon_{null}(k)) + a\cdot\sigma(\Upsilon_{null}(k))$ and perfect homogeneity, so that local heterogeneity will be recognized only if the observed values lie outside this area,

$$\Upsilon_{ob}(k) > \mu(\Upsilon_{null}(k)) + a\cdot\sigma(\Upsilon_{null}(k)). \qquad [7]$$



The parameter $a$ is a constant determining the confidence interval for the evaluation of the null hypothesis. The larger it is, the more restrictive the null model becomes and the more disordered the weights must be for local heterogeneity to be detected. A typical value, in analogy to Gaussian statistics, could be for instance $a = 2$.
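A minimal sketch of the test in Eq. 7 for a single node, assuming the weights of its links are available as a plain list. The function names, and the convention of returning `False` for a node with fewer than two links, are assumptions made for illustration.

```python
import math


def disparity(weights):
    """Observed disparity k * sum_j p_j^2 for the link weights of one node."""
    k = len(weights)
    s = sum(weights)
    return k * sum((w / s) ** 2 for w in weights)


def is_locally_heterogeneous(weights, a=2.0):
    """Eq. 7: observed disparity above the null mean plus a null standard deviations."""
    k = len(weights)
    if k < 2:
        return False  # assumption: a single link offers no fluctuation to test
    mean = 2.0 * k / (k + 1)
    var = k**2 * ((20.0 + 4.0 * k) / ((k + 1) * (k + 2) * (k + 3))
                  - 4.0 / (k + 1) ** 2)
    return disparity(weights) > mean + a * math.sqrt(var)
```

With `a = 2`, a node whose five links carry equal weights is classified as compatible with the null model, whereas one link carrying almost all of the strength pushes the disparity toward k and above the bound.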

As shown in Fig. 5, the overall distributions of weights for both networks considered here are very broad, with tails approaching power-law behaviors spanning six decades for the U.S. airport network and more than four for the Florida Bay food web. At the local level, $\Upsilon(k)$ measurements cannot be explained by the null model for most nodes.
SUPPORTING INFORMATION

The disparity filter for directed weighted networks 

In many systems, interactions between pairs of elements are asymmetric, running partially or totally in one of the two possible directions. Notable examples are the World Wide Web [1], email networks [2], citation networks [3], genetic and metabolic networks [4], [5], or economic networks such as the World Trade Web [6], among others. The undirected network representation then becomes a first-order approximation that can be refined by representing the connections as arrows, indicating the source node at the tail and the destination node at the head. In this way, directed network representations are more complete and convey more information about the system when the directionality of the interactions is relevant. This increase of information content is reflected at the simplest level even in the description of the nodes' connectivities, so that each vertex has to be described by two coexisting degrees $k^{in}$ and $k^{out}$, representing the number of incoming neighbors pointing to it and the number of outgoing neighbors pointed to by it, respectively, which sum up to the total degree $k = k^{in} + k^{out}$. Hence, the degree distribution for a directed network is a joint degree distribution $P(k^{in}, k^{out})$ of in- and out-degrees, which in general may be correlated. In the following, we assume they are not.

Our filtering methodology to extract the backbone of relevant connections in complex multiscale networks can be extended to weighted directed networks. In this type of representation, the total strength $s_i$ associated to a certain node $i$ has two contributions, the incoming strength $s_i^{in}$ and the outgoing strength $s_i^{out}$, which are obtained by summing up all the weights of the incoming or outgoing links, respectively. The normalized weights of the edges linking node $i$ with its neighbors are calculated as $p_{ij}^{in} = \omega_{ij}^{in}/s_i^{in}$ if the link corresponds to an incoming connection, and $p_{ij}^{out} = \omega_{ij}^{out}/s_i^{out}$ if it is associated to an outgoing one, where $\omega_{ij}^{in}$ is the weight of the connection incoming from neighbor $j$ and $\omega_{ij}^{out}$ the weight of the outgoing one. Note that an incoming connection from the point of view of the head node is at the same time an outgoing connection of the tail node.
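The normalization just described can be sketched as follows, assuming the directed network is stored as a dictionary mapping `(tail, head)` pairs to weights. This data layout and the function name are hypothetical choices for illustration.

```python
from collections import defaultdict


def normalized_directed_weights(edges):
    """Return (p_out, p_in), both keyed by (tail, head).

    p_out[(i, j)] is the weight divided by the outgoing strength of the tail i;
    p_in[(i, j)] is the same weight divided by the incoming strength of the head j.
    """
    s_out = defaultdict(float)
    s_in = defaultdict(float)
    for (tail, head), w in edges.items():
        s_out[tail] += w  # the edge is outgoing for its tail node...
        s_in[head] += w   # ...and incoming for its head node
    p_out = {(t, h): w / s_out[t] for (t, h), w in edges.items()}
    p_in = {(t, h): w / s_in[h] for (t, h), w in edges.items()}
    return p_out, p_in
```

Each directed edge therefore receives two normalized weights, one from the point of view of each of its end nodes.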

The strategy in this case is, as before, based on the detection of local heterogeneities. The goal is to preserve the edges carrying a weight that represents a significant local deviation with respect to a statistical null model for the local assignment of weights by using the disparity function, this time with the condition that the incoming and outgoing links associated to a node must be considered separately. For each node $i$ with $k^{in}$ incoming neighbors and $k^{out}$ outgoing ones, one can calculate the functions

$$\Upsilon_i(k^{in}) = k^{in}\,Y_i(k^{in}) = k^{in}\sum_j \left(p_{ij}^{in}\right)^2, \qquad [8]$$

$$\Upsilon_i(k^{out}) = k^{out}\,Y_i(k^{out}) = k^{out}\sum_j \left(p_{ij}^{out}\right)^2. \qquad [9]$$

$\Upsilon_i(k^{in})$ characterizes the level of local heterogeneity in the incoming weights, while $\Upsilon_i(k^{out})$ corresponds to the outgoing counterpart. As happens in the undirected case, under perfect homogeneity, when all the incoming (outgoing) links share the same amount of the incoming (outgoing) strength of the node, $\Upsilon_i(k^{in})$ ($\Upsilon_i(k^{out})$) equals 1 independently of $k^{in}$ ($k^{out}$), while in the case of perfect heterogeneity, when just one of the incoming (outgoing) links carries the whole incoming (outgoing) strength of the node, this function is equal to $k^{in}$ ($k^{out}$). An intermediate power-law behavior is usually observed in real systems, indicating that the incoming (outgoing) weights associated to a node are peaked on a small number of links, with the remaining connections carrying just a small fraction of the node's incoming (outgoing) strength. This is the situation where our filter will be most useful, highlighting structures impossible to detect using the global threshold filter. In this way, the disparity function can be used as a preliminary indicator of the presence of local heterogeneities.
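The two disparity functions of Eqs. 8 and 9 can be computed per node as in the sketch below. It assumes the same hypothetical `(tail, head) -> weight` dictionary layout and reports `None` for a direction in which a node has no links, which is a convention chosen here rather than one stated in the text.

```python
from collections import defaultdict


def directed_disparities(edges):
    """Return {node: (disparity_in, disparity_out)} following Eqs. 8 and 9."""
    incoming = defaultdict(list)
    outgoing = defaultdict(list)
    for (tail, head), w in edges.items():
        outgoing[tail].append(w)
        incoming[head].append(w)

    def disparity(weights):
        if not weights:
            return None  # assumption: undefined when the node has no such links
        k, s = len(weights), sum(weights)
        return k * sum((w / s) ** 2 for w in weights)

    nodes = set(incoming) | set(outgoing)
    return {n: (disparity(incoming.get(n, [])), disparity(outgoing.get(n, [])))
            for n in nodes}
```

Values near 1 indicate homogeneous weights in that direction, while values near the corresponding degree indicate that a single link dominates.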

The null model. The null model that we use to define anomalous fluctuations of weights in directed networks with strong disorder provides the expectation for the disparity measures above in a purely random case. The null hypothesis is made independently for the sets of incoming and outgoing connections and is the same as in the undirected case. It assumes that the normalized weights which correspond to the incoming (outgoing) connections of a certain node of in-degree $k^{in}$ (out-degree $k^{out}$) are produced by a uniform random assignment. To visualize this process, $k^{in} - 1$ ($k^{out} - 1$) points are distributed with uniform probability in the interval $[0, 1]$, so that it ends up divided into $k^{in}$ ($k^{out}$) subintervals. Their lengths represent the expected values for the $k^{in}$ ($k^{out}$) normalized weights $p_{ij}^{in}$ ($p_{ij}^{out}$) according to the null hypothesis. The probability density function for one of these variables taking a particular value $x$ is

$$\rho(x)\,dx = (\kappa-1)(1-x)^{\kappa-2}\,dx, \qquad [10]$$

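where $\kappa$ stands for $k^{in}$ or $k^{out}$, depending on whether the fluctuations in incoming or outgoing intensities are being evaluated. This probability density function, along with the joint probability distribution for two intervals given by

$$\rho(x,y)\,dx\,dy = (\kappa-1)(\kappa-2)(1-x-y)^{\kappa-3}\,\Theta(1-x-y)\,dx\,dy, \qquad [11]$$

where $\Theta(\cdot)$ is the Heaviside step function, can be used to calculate the statistics of $\Upsilon_{null}(k^{in})$ and $\Upsilon_{null}(k^{out})$ for the null model. The averages $\mu(\Upsilon_{null}(\kappa)) = \kappa\,\mu(Y_{null}(\kappa))$ and the variances $\sigma^2(\Upsilon_{null}(\kappa)) = \kappa^2\sigma^2(Y_{null}(\kappa))$ are found to be:

$$\mu(\Upsilon_{null}(\kappa)) = \frac{2\kappa}{\kappa+1}, \qquad [12]$$

$$\sigma^2(\Upsilon_{null}(\kappa)) = \kappa^2\left(\frac{20+4\kappa}{(\kappa+1)(\kappa+2)(\kappa+3)} - \frac{4}{(\kappa+1)^2}\right). \qquad [13]$$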

Notice that the two moments depend on the incoming or outgoing degree $\kappa$, so that each node in the network with a certain $k^{in}$ and $k^{out}$ should be compared to the corresponding functions.
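The uniform random assignment behind Eq. 10 can be simulated directly. Integrating Eq. 10 from $x$ to 1 gives the survival function $(1-x)^{\kappa-1}$, and the sketch below checks it by sampling. It relies on the fact that all subintervals share the same marginal distribution, so only the first one (the smallest cut point) is sampled; the function names, sample size, and seed are illustrative assumptions.

```python
import random


def first_interval_length(kappa, rng):
    """Length of the first subinterval when kappa - 1 uniform points cut [0, 1]."""
    return min(rng.random() for _ in range(kappa - 1))


def empirical_survival(kappa, x, n_samples=100_000, seed=0):
    """Fraction of null-model draws whose normalized weight is at least x."""
    rng = random.Random(seed)
    hits = sum(first_interval_length(kappa, rng) >= x for _ in range(n_samples))
    return hits / n_samples


def exact_survival(kappa, x):
    """Integral of Eq. 10 from x to 1."""
    return (1.0 - x) ** (kappa - 1)
```

For any degree of at least 2, `empirical_survival(kappa, x)` should approach `exact_survival(kappa, x)` as the number of samples grows.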

In real or modeled networks, the disparities can be directly observed and the functions $\Upsilon_{ob}(k^{in})$ and $\Upsilon_{ob}(k^{out})$ can be compared against the null model expectations. Values compatible with the null hypotheses could be defined as those in the region between $\mu(\Upsilon_{null}(\kappa)) + a\cdot\sigma(\Upsilon_{null}(\kappa))$ and perfect homogeneity, so that local heterogeneity will be recognized only if the observed values lie outside this area,

$$\Upsilon_{ob}(\kappa) > \mu(\Upsilon_{null}(\kappa)) + a\cdot\sigma(\Upsilon_{null}(\kappa)). \qquad [14]$$

The parameter $a$ is a constant determining the confidence interval for the evaluation of the null hypothesis. The larger it is, the more restrictive the null model becomes and the more disordered the weights must be for local heterogeneity to be detected. A typical value, in analogy to Gaussian statistics, could be for instance $a = 2$. In this way, it is possible to characterize quantitatively the level of disorder observed in the distribution of weights in incoming and outgoing links. Especially when this disorder is high, our disparity filtering technique allows us to extract the backbone of relevant directed connections.



The disparity filter. The disparity filter proceeds by identifying which incoming and outgoing links for each node should be preserved in the network. The null model allows this discrimination through the calculation, for each incoming (outgoing) edge of a given node $i$, of the probability $\alpha_{ij}^{in}$ ($\alpha_{ij}^{out}$) that its normalized weight $p_{ij}^{in}$ ($p_{ij}^{out}$) is compatible with the null hypothesis. In statistical inference, this concept is known as the p-value, the probability that, if the null hypothesis is true, one obtains a value for the variable under consideration larger than or equal to the observed one. By imposing a significance level $\alpha$, the incoming (outgoing) links that carry weights which can be considered not compatible with a random distribution can be singled out with a certain statistical significance. All the incoming (outgoing) links with $\alpha_{ij}^{in} < \alpha$ ($\alpha_{ij}^{out} < \alpha$) reject the null hypothesis and can be considered as significant heterogeneities. By changing the significance level we can filter the incoming (outgoing) links progressively, focusing on more relevant heterogeneities. The statistically significant inhomogeneous weights will then be those which satisfy

$$\alpha_{ij}^{in} = 1 - (k^{in} - 1)\int_0^{p_{ij}^{in}} (1-x)^{k^{in}-2}\,dx < \alpha, \qquad [15]$$

$$\alpha_{ij}^{out} = 1 - (k^{out} - 1)\int_0^{p_{ij}^{out}} (1-x)^{k^{out}-2}\,dx < \alpha. \qquad [16]$$

Note that these expressions are calculated as a function of the probability density function Eq. [10] and again depend on the number of connections $k^{in}$ or $k^{out}$ of the node to which the directed link under consideration is attached.
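The integral in these p-values can be evaluated in closed form: since $(\kappa-1)\int_0^{p}(1-x)^{\kappa-2}dx = 1-(1-p)^{\kappa-1}$, the p-value reduces to $(1-p)^{\kappa-1}$. The sketch below compares this closed form with a direct midpoint-rule evaluation; the function names and the number of integration steps are illustrative assumptions.

```python
def alpha_closed_form(p, kappa):
    """p-value of a normalized weight p at a node with kappa links in that direction."""
    return (1.0 - p) ** (kappa - 1)


def alpha_numeric(p, kappa, steps=10_000):
    """Midpoint-rule evaluation of 1 - (kappa - 1) * integral_0^p (1 - x)^(kappa - 2) dx."""
    h = p / steps
    integral = h * sum((1.0 - (i + 0.5) * h) ** (kappa - 2) for i in range(steps))
    return 1.0 - (kappa - 1) * integral
```

Using the closed form avoids numerical integration altogether when the filter is applied to large networks.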

The multiscale backbone of weighted directed networks is then obtained by preserving all the incoming and outgoing links which beat the threshold for at least one of the two nodes at the ends of the link, while discounting the rest. Notice that an outgoing connection for the tail node is an incoming connection for the head one, so the outgoing connections and the appropriate null model should be considered for the first, and the incoming connections and the corresponding null model for the second. In the case of a node $i$ with out-degree $k_i^{out} = 1$ connected to a node $j$ with in-degree $k_j^{in} > 1$, we keep the connection only if it beats the threshold for the in-null model of node $j$, while if the in-degree of node $i$ is $k_i^{in} = 1$ and it is connected to a node $j$ of out-degree $k_j^{out} > 1$, we keep the connection only if it beats the threshold for the out-null model of node $j$. In this way, relevant fluctuations at all scales are selected and small nodes in terms of strength are not belittled, so that the system remains in the percolated phase. Finally, in the rare case that node $i$ has out-degree $k_i^{out} = 1$ and in-degree $k_i^{in} > 1$ and is connected to a node $j$ with in-degree $k_j^{in} = 1$ and out-degree $k_j^{out} > 1$, we keep the connection, as it is the only way to maintain the connectivity of the network.
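The selection rule for directed edges can be sketched as below, using the closed form $(1-p)^{\kappa-1}$ of the p-value and a hypothetical `(tail, head) -> weight` dictionary as input. One simplification is an assumption of this sketch and not a statement of the text: an edge is kept whenever the tail has out-degree 1 and the head has in-degree 1, without checking the remaining degrees of the two nodes.

```python
from collections import defaultdict


def directed_disparity_backbone(edges, alpha=0.05):
    """Return the set of (tail, head) edges kept at significance level alpha."""
    s_out, s_in = defaultdict(float), defaultdict(float)
    k_out, k_in = defaultdict(int), defaultdict(int)
    for (i, j), w in edges.items():
        s_out[i] += w
        k_out[i] += 1
        s_in[j] += w
        k_in[j] += 1

    kept = set()
    for (i, j), w in edges.items():
        ko, ki = k_out[i], k_in[j]
        if ko == 1 and ki == 1:
            # Assumption: neither end can be tested, so keep the edge
            # to avoid disconnecting the two nodes.
            kept.add((i, j))
            continue
        # Out-null model of the tail and in-null model of the head;
        # a degree-1 end cannot pass its own test.
        out_ok = ko > 1 and (1.0 - w / s_out[i]) ** (ko - 1) < alpha
        in_ok = ki > 1 and (1.0 - w / s_in[j]) ** (ki - 1) < alpha
        if out_ok or in_ok:
            kept.add((i, j))
    return kept
```

Lowering `alpha` makes both tests stricter, so fewer edges survive.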

By choosing a constant significance level $\alpha$ we obtain a homogeneous criterion that allows us to compare inhomogeneities in nodes with different magnitudes of connections and strength. By decreasing the significance level, more restrictive subsets are obtained, giving rise to a potential hierarchy of backbones. This strategy will be efficient whenever the level of heterogeneity is high. Otherwise, the pruning could lose its hierarchical attribute.



Networks with uncorrelated weights

The disparity filter and the global threshold strategy give similar results when applied to a complex network with uncorrelated weights, whenever their probability distribution $P(\omega)$ has a well defined average. From a practical point of view, a network with uncorrelated weights can be easily realized by assigning to each edge of the network an intensity drawn independently at random from $P(\omega)$. Distributions with a well defined average could be homogeneous distributions, where all weights fluctuate around a characteristic value, but could also be highly heterogeneous ones, for instance those with power-law form with exponent larger than two.

Next, we prove analytically (for undirected networks, although the same reasoning is also valid for directed ones) the approximate equivalence of the two filters for a certain relation, which we derive, between the significance level $\alpha$ and the global threshold $\omega_c$. More specifically, we demonstrate that the probability $S(\omega_{ij}|k)$ that a given edge of weight $\omega_{ij}$ connected to a node $i$ of degree $k$ remains in the disparity-filtered network is the same as that of remaining in the globally thresholded one, $\Theta(\omega_{ij} - \omega_c)$, where $\Theta(\cdot)$ is the Heaviside step function. Henceforth, we generally refer to these probabilities as survival probabilities.

From Eq. [2] in the main text, the disparity filter keeps those edges with weights $\omega_{ij} > (\alpha^{-1/(k-1)} - 1)\sum_{l \neq j}\omega_{il}$. The disparity filter survival probability can thus be expressed as

$$S(\omega_{ij}|k) = \int\cdots\int \Theta\Big(\omega_{ij} - (\alpha^{-1/(k-1)} - 1)\sum_{l \neq j}\omega_{il}\Big)\prod_{l \neq j} P(\omega_{il})\,d\omega_{il}. \qquad [17]$$

In the previous equation, we have taken into account that, for this particular model, weights are uncorrelated and that for every edge the weight is identically and independently distributed according to $P(\omega)$. Calculations are very much simplified in Laplace space where, generically, we define the Laplace transform of a function $f(\omega)$ as $\hat{f}(u) = \int_0^{\infty} f(\omega)e^{-u\omega}\,d\omega$. Using this transformation, Eq. [17] reads

$$\hat{S}(u|k) = \frac{1}{u}\Big[\hat{P}\big(u(\alpha^{-1/(k-1)} - 1)\big)\Big]^{k-1}. \qquad [18]$$

For large degrees, one can make the approximation

$$\hat{P}\big[u(\alpha^{-1/(k-1)} - 1)\big] \simeq \hat{P}\big[u\ln\alpha^{-1}/(k-1)\big], \qquad [19]$$

and truncating the Taylor series expansion to the first order

$$\hat{P}\big[u(\alpha^{-1/(k-1)} - 1)\big] \simeq 1 - \langle\omega\rangle u\ln\alpha^{-1}/(k-1). \qquad [20]$$

Substituting this into Eq. [18],

$$\hat{S}(u|k) \simeq \frac{1}{u}\Big[1 - \frac{\langle\omega\rangle u\ln\alpha^{-1}}{k-1}\Big]^{k-1} \simeq \frac{1}{u}\,e^{-\langle\omega\rangle u\ln\alpha^{-1}}. \qquad [21]$$

Notice that this expression has lost any dependence on the vertex degree $k$. Finally, inverting the Laplace transformation,

$$S(\omega_{ij}|k) \simeq \Theta\big(\omega_{ij} - \langle\omega\rangle\ln\alpha^{-1}\big). \qquad [22]$$
Fig. 6. Fraction of nodes and edges as a function of the fraction of total weight retained by the global and disparity filters acting on the airport network with a random assignment of weights according to the distribution $P(\omega) \propto \omega^{-2.5}$.

Hence, the survival probability under the disparity filter with significance level $\alpha$ is approximately equal to the survival probability under the global threshold for a threshold value $\omega_c \simeq \langle\omega\rangle\ln\alpha^{-1}$, independent of the degree $k$. Figure 6 shows the result of both filters on the airport network with a random assignment of weights to edges. In this case, we use $P(\omega) \propto \omega^{-\beta}$ with $\beta = 2.5$. As is clearly visible, both filters give very similar results, in agreement with the calculations above.
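The large-degree limit behind the relation $\omega_c \simeq \langle\omega\rangle\ln\alpha^{-1}$ can be checked numerically. The sketch below assumes that the sum of the other $k-1$ weights of a node is close to $(k-1)\langle\omega\rangle$, which is reasonable only when the average is well defined; the function names are hypothetical.

```python
import math


def local_threshold_factor(alpha, k):
    """Factor c(k): the filter keeps w_ij > c(k) * (sum of the other k - 1 weights)."""
    return alpha ** (-1.0 / (k - 1)) - 1.0


def effective_threshold(mean_weight, alpha, k):
    """Approximate weight threshold at a degree-k node.

    Assumption: the other k - 1 weights sum to about (k - 1) * mean_weight.
    """
    return (k - 1) * local_threshold_factor(alpha, k) * mean_weight


def global_threshold(mean_weight, alpha):
    """Degree-independent limit, omega_c = <omega> * ln(1 / alpha)."""
    return mean_weight * math.log(1.0 / alpha)
```

Evaluating `effective_threshold` for increasing `k` shows it decreasing toward `global_threshold`, which is the degree independence stated above.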

Unbounded average. Notice that if the average of $P(\omega)$ is unbounded, the previous relation is not well defined. However, this is the case for most real networks, which are characterized by a power-law weight distribution with exponent less than two, so that its first moment diverges. In this situation, the equivalence of the two methodologies does not hold. This is mainly due to the symmetry breaking that we impose on the filtering condition when we consider that the same intensity $\omega$ may be relevant in a different way depending on whether it is considered as $\omega_{ij}$ or as $\omega_{ji}$. Each edge is incident to two nodes; while the weight carried by the edge may not be a relevant fluctuation for one node (for instance, a node with several other links with large weight), it could be a relevant fluctuation for the other node. This is what allows us to preserve relevant fluctuations at different scales and to provide a backbone including nodes handling total weights of very different magnitudes.



For this reason, instead of considering weights directly, our methodology works with the normalized weights $p_{ij} = \omega_{ij}/s_i$ and $p_{ji} = \omega_{ij}/s_j$ as independent quantities. One might want to enforce symmetry by imposing an AND rule instead of the OR rule that we have chosen, so that a connection is preserved only when its intensity is significant for both nodes involved. However, the OR rule in the disparity filter, which we prefer because it ensures that small nodes in terms of strength are not belittled, only demands that the connection is important for one of the two. Remember that in networks where weights are not correlated there is a relation between the strength $s$ of nodes and the average weight in the network of the form $s \simeq k\langle\omega\rangle$. If the average is not well defined, the strength of nodes can fluctuate wildly, so that the same weight can be experienced as extremely important or unimportant depending on the node and, as a consequence, the AND and OR rules produce very different results.

In Fig. 7, we show the effect of considering the disparity filter with the AND and OR rules on networks with uncorrelated weights with unbounded average. The AND disparity filter is qualitatively very similar to the global threshold algorithm regarding the number of preserved nodes and edges, while the OR disparity filter maintains a similar number of edges with a much larger number of nodes.
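The difference between the two rules can be illustrated with a small undirected sketch. It assumes each edge is listed once as a `(u, v) -> weight` entry and that an end node of degree 1 never counts as significant on its own; both are conventions of this example, and the function name is hypothetical.

```python
from collections import defaultdict


def disparity_filter(edges, alpha=0.05, rule="or"):
    """Return the edges kept by the undirected disparity filter under the given rule."""
    if rule not in ("or", "and"):
        raise ValueError("rule must be 'or' or 'and'")
    strength, degree = defaultdict(float), defaultdict(int)
    for (u, v), w in edges.items():
        for node in (u, v):
            strength[node] += w
            degree[node] += 1

    def significant(node, w):
        k = degree[node]
        # Assumption: a degree-1 node provides no null model to test against.
        return k > 1 and (1.0 - w / strength[node]) ** (k - 1) < alpha

    combine = any if rule == "or" else all
    return {e for e, w in edges.items()
            if combine(significant(node, w) for node in e)}
```

An edge that dominates the strength of a small node but is negligible for a large hub survives under `rule="or"` and is discarded under `rule="and"`, which is the mechanism that lets the OR rule retain more nodes.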



[Figure 7 plot. Horizontal axis: % of weight in backbone. Surviving legend entry: % of nodes, disparity filter, .OR. rule.]

Fig. 7. Fraction of nodes and edges as a function of the fraction of total weight retained by the global and disparity filters (with .OR. and .AND. rules) acting on the airport network with reshuffled weights.